Rating Naturalness in Speech Synthesis: The Effect of Style and Expectation
Abstract
In this paper we present evidence that speech produced spontaneously in a conversation is considered more natural than read prompts. We also explore the relationship between participants’ expectations of the speech style under evaluation and their actual ratings. In successive listening tests, subjects rated the naturalness of either spontaneously produced, read-aloud or written sentences, with instructions toward either conversational, reading or general naturalness. It was found that, when presented with spontaneous or read-aloud speech, participants consistently rated spontaneous speech as more natural, even when asked to rate naturalness in the reading case. Presented with only text, participants generally preferred transcriptions of spontaneous utterances, except when asked to evaluate naturalness in terms of reading aloud. This has implications for the application of MOS-scale naturalness ratings in speech synthesis, and potentially for the type of data suitable for use in general TTS, in dialogue systems and specifically in conversational TTS, in which the goal is to reproduce speech as it is produced in a spontaneous conversational setting.
Related resources
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants computers the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Stages and Method of Preparing Syllable and Diphone Speech Databases for a Persian Text-to-Speech System
Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two speech databases for Persian, one syllable-based and one diphone-based, and examines how they were developed, their specifications, and their advantages relative to each other. ...
Performance evaluation of style adaptation for hidden semi-Markov model based speech synthesis
This paper describes a style adaptation technique using hidden semi-Markov model (HSMM) based maximum likelihood linear regression (MLLR). The HSMM-based MLLR technique can estimate regression matrices for affine transforms of the mean vectors of the output and state duration distributions, which maximize the likelihood of the adaptation data using the EM algorithm. In this study, we apply this adaptation technique ...
Native EFL Raters’ Criteria in Assessing the Speech Act of Complaint: The Case of American and British EFL Teachers
Despite the importance of interlanguage pragmatic (ILP) rating in the second language teaching and learning context, scant attention has been devoted to it. This study aims to investigate native EFL teachers’ major criteria in assessing the speech act of complaint produced by Iranian EFL learners. To this end, two groups of experienced native raters, including American (n=47) and Britis...
Phonological Reduction in Swedish
In this paper, the importance of pronunciation variation modelling is discussed. As a first step in developing a model of Swedish pronunciation variation due to speaking style and speech rate, a tentative reduction rule system has been developed. An assessment experiment testing the impact of phonological reduction, as defined by this system, on the perceived naturalness of speech synthesis was...